\section{Conclusions}
\label{sec:conclusions}

In this paper, we considered the problem of performing the Kolmogorov-Smirnov test on streaming data. We gave algorithms for both the one-sample and the two-sample versions of the test, along with guarantees of their performance. Our algorithms make use of the techniques for computing quantiles on a stream and hence may have improved results when further progress is made on this problem. Besides giving analytical guarantees, we give a near-tight lower bound. Moreover, we show via experiments on real and synthetic data that the proposed algorithms are capable of performing the test with a two order magnitude reduction in the size of the data. Our experiments also showed that the Greenwald-Khanna sketch is best suited for our algorithm, and that it is considerably superior to other simple techniques such as sampling.

There are several open problems that still remain. The algorithms proposed in this paper need $O(\sqrt{n}\log{n})$ samples, and it would be interesting to close the gap with the $\Omega(\sqrt{n})$ lower bound. While the algorithm proposed in this paper makes use of quantile sketches, and hence has the same space usage as them, it is unclear whether a testing algorithm exists that uses asymptotically less space than any quantile sketch. It would be interesting to find such an algorithm, or alternatively to prove that any quantile sketch must use as much space as any testing algorithm. From the experimental side, it would be interesting to design other quantile algorithms that are targeted towards improving the accuracy of the result of the KS test algorithm in this paper. 

There are also many other statistical tests, both parametric and non-parametric, that do not have known streaming algorithms. Identifying which ones have sublinear space algorithms and developing these algorithms is another avenue for future work.


%\noindent{\bf Acknowledgement:} We thank anonymous reviewers for
%very helpful comments on the paper. And Erin. And Ryan.
